Attribute Selection by Measuring Information on Reference Distributions
Author
Abstract
A great number of services, experiments, and decisions at Yahoo! require analyzing rich data sources. This data almost invariably holds a large number of attributes. In these scenarios, the efficient selection of relevant attributes is imperative for data analysis (e.g., modeling, prediction). When approaching new data analysis tasks, domain experts, researchers, and engineers spend a considerable amount of resources identifying (manually or semi-automatically) these relevant attributes. This paper attempts to address this problem by providing a simple and largely automated attribute selection approach. The method is based on reformulating the mutual information (MI) measure. We show why MI cannot in general be used effectively without considerable domain expertise and describe a more appropriate measure that allows for a much higher level of automation (removing considerable manual work from the analysis loop). Experiments on the tasks of predicting clicks and conversions for the Yahoo! display advertising platform, in the context of the NGDStone project, show the effectiveness of the proposed approach.
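For orientation, the standard MI score that the abstract takes as its starting point can be estimated from co-occurrence counts of a discrete attribute and a label. The sketch below is the textbook plug-in estimator, not the reformulated measure the paper proposes; the function names, attribute names, and toy data are illustrative assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_attributes(columns, labels):
    """Return (attribute name, MI) pairs sorted by decreasing estimated MI."""
    scores = {name: mutual_information(vals, labels)
              for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: 8 impressions with a binary click label.
clicks = [1, 1, 0, 0, 1, 0, 1, 0]
columns = {
    "device": ["m", "m", "d", "d", "m", "d", "m", "d"],  # determines the label
    "noise": [0, 1, 0, 1, 1, 0, 0, 1],                   # independent of it
}
ranking = rank_attributes(columns, clicks)
```

On this toy data, `device` scores one full bit and `noise` scores zero, so `device` ranks first. Real attributes rarely separate this cleanly, which is where the choice of measure starts to matter.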
Related resources
Multiple attribute decision making with triangular intuitionistic fuzzy numbers based on zero-sum game approach
For many decision problems with uncertainty, triangular intuitionistic fuzzy number is a useful tool in expressing ill-known quantities. This paper develops a novel decision method based on zero-sum game for multiple attribute decision making problems where the attribute values take the form of triangular intuitionistic fuzzy numbers and the attribute weights are unknown. First, a new value ind...
An analysis of cultural land use spatial distributions using geographic information system in District 3 of Tehran Municipality
This research was conducted with the aim of analyzing the spatial distribution of existing cultural land use in District 3 of Tehran Municipality and proposing new sites for cultural land use in this district. After studying previous research on the issues of land use and site selection, 13 indicators for locating cultural land use were identified; then, by using distance mappi...
Designing a model of intuitionistic fuzzy VIKOR in multi-attribute group decision-making problems
Multiple attributes group decision making (MAGDM) is regarded as the process of determining the best feasible solution by a group of experts or decision makers according to the attributes that represent different effects. In assessing the performance of each alternative with respect to each attribute and the relative importance of the selected attributes, quantitative/qualitative evaluations ar...
Triangular Intuitionistic Fuzzy Triple Bonferroni Harmonic Mean Operators and Application to Multi-attribute Group Decision Making
As a special intuitionistic fuzzy set defined on the real number set, the triangular intuitionistic fuzzy number (TIFN) is a fundamental tool for quantifying an ill-known quantity. In order to model the decision maker's overall preference with mandatory requirements, it is necessary to develop some Bonferroni harmonic mean operators for TIFNs which can be used to effectively integrate the informa...
Optimal Budget-constrained Sample Allocation for Selection Decisions with Multiple Uncertain Attributes
Title of Dissertation: Optimal Budget-Constrained Sample Allocation for Selection Decisions with Multiple Uncertain Attributes. Dennis D. Leber, Doctor of Philosophy, 2016. Dissertation directed by: Professor Jeffrey W. Herrmann, Department of Mechanical Engineering. A decision-maker, when faced with a limited and fixed budget to collect data in support of a multiple attribute selection decision, m...